Mapping of Sequence Reads to the Reference Genomes ◾ 75
In the following, we will use “bwa aln” to perform the first step of the alignment. Run
“bwa aln” without any options on the command line to see its usage and available
options. If the quality of reads at the 3′-end is low, we can use the “-q” option with this
command to specify a quality threshold for trimming reads (down to a minimum of 35 bp).
Run the following commands from the parent directory of the “refgenome” and “data” directories:
bwa aln \
refgenome/GRCh38.p13_ref.fna \
data/SRR769545_1.fastq.gz \
> data/SRR769545_1.sai
bwa aln \
refgenome/GRCh38.p13_ref.fna \
data/SRR769545_2.fastq.gz \
> data/SRR769545_2.sai
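The “-q” trimming option described above can be added to the same commands. The following is a minimal sketch, not a recommended setting: the threshold of 15 and the thread count of 4 (passed with “-t”) are illustrative assumptions, and “aln_cmd” is a hypothetical helper that only prints each command so that it can be reviewed before it is run.

```shell
# Sketch: build the "bwa aln" command for each mate with 3'-end trimming.
# qual=15 and threads=4 are illustrative assumptions, not tuned values.
ref=refgenome/GRCh38.p13_ref.fna
qual=15
threads=4

# aln_cmd (hypothetical helper): print the command for mate 1 or 2.
aln_cmd() {
    printf 'bwa aln -q %s -t %s %s data/SRR769545_%s.fastq.gz > data/SRR769545_%s.sai\n' \
        "$qual" "$threads" "$ref" "$1" "$1"
}

for mate in 1 2; do
    aln_cmd "$mate"                 # print the command for review
    # eval "$(aln_cmd "$mate")"     # uncomment to actually run it
done
```

Printing the commands first makes it easy to confirm the file names and options before starting a run that may take hours on a human genome.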
Then, we can use “bwa sampe” to generate the SAM file for the alignments.
$ bwa sampe \
refgenome/GRCh38.p13_ref.fna \
data/SRR769545_?.sai \
data/SRR769545_?.fastq.gz \
> sam/SRR769545_aln.sam 2> sam/SRR769545_aln.log
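The “bwa sampe” command expects its arguments in the order: reference, the two “.sai” files, then the two FASTQ files. The “?” wildcard in the command above is expanded by the shell, not by bwa, and it relies on the matches being returned in sorted order. The sketch below demonstrates that expansion in a scratch directory with empty placeholder files; the temporary directory is an assumption made only for this demonstration.

```shell
# Sketch: show how the shell expands the "?" wildcard used with "bwa sampe".
# Works in a scratch directory with empty placeholder files.
tmp=$(mktemp -d)
mkdir -p "$tmp/data"
touch "$tmp/data/SRR769545_2.sai" "$tmp/data/SRR769545_1.sai"

# Matches are returned in sorted order, so mate 1 comes before mate 2.
expanded=$(cd "$tmp" && echo data/SRR769545_?.sai)
echo "$expanded"    # prints: data/SRR769545_1.sai data/SRR769545_2.sai

rm -rf "$tmp"

# Equivalent explicit form of the real command (not executed here);
# "mkdir -p sam" is needed only if the output directory does not exist yet:
#   mkdir -p sam
#   bwa sampe refgenome/GRCh38.p13_ref.fna \
#       data/SRR769545_1.sai data/SRR769545_2.sai \
#       data/SRR769545_1.fastq.gz data/SRR769545_2.fastq.gz \
#       > sam/SRR769545_aln.sam 2> sam/SRR769545_aln.log
```

Writing the file names out explicitly is more verbose, but it removes any dependence on which other files in “data” happen to match the pattern.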
2.3.2.2 Bowtie2
Bowtie2 is an aligner that uses the BWT and the FM-index as data structures for indexing the
reference genome. It is an ultrafast, memory-efficient short-read aligner that can map
millions of reads to a reference genome on a typical desktop computer. Bowtie2
is the successor of the original Bowtie, which requires the reads to have equal length
and does not align reads with gaps; Bowtie2 was developed to overcome those limitations.
It performs read mapping in four steps: (i) extracting seeds from the reads and
their reverse complements, (ii) using the FM-index for exact ungapped alignment of the seeds, (iii)
sorting the seed alignments by score and identifying their positions on the reference
genome from the index, and (iv) extending the seeds into full alignments using parallel
dynamic programming [16]. Bowtie2 can be installed on Linux with the following
commands:
git clone https://github.com/BenLangmead/bowtie2.git
cd bowtie2; make
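After the build finishes, it can be useful to confirm that it produced a runnable program before going further. The following is a minimal sketch, assuming the “bowtie2” wrapper script sits in the top level of the cloned directory; “verify_build” is a hypothetical helper name, and “--version” is used only to check that the program starts.

```shell
# Sketch: confirm that the clone directory contains a runnable bowtie2.
# verify_build is a hypothetical helper; pass it the clone directory.
verify_build() {
    if [ -x "$1/bowtie2" ]; then
        "$1/bowtie2" --version
    else
        echo "bowtie2 not found in $1; check that make finished without errors" >&2
        return 1
    fi
}

# Run from inside the clone directory after "make".
verify_build . || echo "build check failed"
```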
Then, you need to add the Bowtie2 directory to your PATH so that you can run it from any
directory. To do so, edit the “.bashrc” file in your home directory.
cd $HOME
vim .bashrc